feat: integrate stats transforms into checkpoint data generation #1643
Open
DrakeLin wants to merge 2 commits into delta-io:main from DrakeLin:stack/write-stats
+1,156
−80
f80a655 to f767ced
This was referenced Jan 21, 2026
f767ced to fcb7344
fcb7344 to de818e5
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1643      +/-   ##
==========================================
+ Coverage   84.65%   84.76%   +0.11%
==========================================
  Files         123      124       +1
  Lines       34109    34418     +309
  Branches    34109    34418     +309
==========================================
+ Hits        28875    29175     +300
+ Misses       3905     3904       -1
- Partials     1329     1339      +10
de818e5 to dfccba4
c0d47c1 to 9df9500
9df9500 to cd64f79
DrakeLin added a commit that referenced this pull request Jan 23, 2026
## 🥞 Stacked PR

Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/1645/files) to review incremental changes.

- [**stack/null-propagation**](#1645) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1645/files)]
- [stack/coalesce](#1648) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1648/files/37e755009566511bf7c2f00e014c1647e77e4533..d64042f7908844ef2d8a1c68312dc3ff936d60dc)]
- [stack/checkpoint-transforms](#1646) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1646/files/d64042f7908844ef2d8a1c68312dc3ff936d60dc..4e66ca004f89b23431a96ac106a9c0d400718b10)]
- [stack/write-stats](#1643) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1643/files/4e66ca004f89b23431a96ac106a9c0d400718b10..cd64f79fd3b40ebfa811cb333369cb17aa1a2a74)]

---------

## What changes are proposed in this pull request?

Fixes a bug in nested transform expression evaluation where null rows in the source struct were losing their null bitmap, causing null structs to incorrectly appear as non-null structs with null fields.

When evaluating nested transform expressions (transforms with an input_path that operate on a nested struct), the output StructArray was created with None for the null buffer:

`let data = StructArray::try_new(output_fields.into(), output_cols, None)?;`

This meant that if the source struct had null rows (e.g., an add action that is null in a checkpoint batch), the output would lose that null information. The struct would appear as non-null but with all-null fields, which is semantically different.

## How was this change tested?

Existing transform tests pass. The stats transform integration tests (in a follow-up PR) exercise this code path.
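The difference between a null struct and a non-null struct with null fields can be illustrated with a toy model. The sketch below is not the Arrow API and not the kernel's code: `StructColumn`, `transform_dropping_nulls`, `transform_preserving_nulls`, and `null_rows` are hypothetical names, and the "transform" simply copies the child values. It only shows what is lost when the row-level validity is replaced with `None`.

```rust
/// Toy stand-in for a struct column (hypothetical, not arrow's StructArray).
#[derive(Debug, Clone, PartialEq)]
struct StructColumn {
    /// Number of rows in the column.
    len: usize,
    /// One Vec per child field; `None` marks a null field value.
    fields: Vec<Vec<Option<i64>>>,
    /// Row-level validity; `None` means every row is non-null.
    validity: Option<Vec<bool>>,
}

/// Buggy shape: rebuild the struct from its children and discard validity,
/// analogous to passing `None` as the null buffer.
fn transform_dropping_nulls(src: &StructColumn) -> StructColumn {
    StructColumn {
        len: src.len,
        fields: src.fields.clone(),
        validity: None,
    }
}

/// Fixed shape: carry the source validity over to the output.
fn transform_preserving_nulls(src: &StructColumn) -> StructColumn {
    StructColumn {
        len: src.len,
        fields: src.fields.clone(),
        validity: src.validity.clone(),
    }
}

/// Indices of rows where the struct itself is null.
fn null_rows(col: &StructColumn) -> Vec<usize> {
    match &col.validity {
        Some(v) => (0..col.len).filter(|&row| !v[row]).collect(),
        None => Vec::new(),
    }
}
```

With a three-row source whose middle row is null, `null_rows` reports row 1 only for the output of `transform_preserving_nulls`; the output of `transform_dropping_nulls` reports no null rows, even though the child value at row 1 is still `None`.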
ad7ef99 to e2ba315
e2ba315 to eee1020
1c451f4 to 74ea04d
74ea04d to 03a8e93
🥞 Stacked PR
Use this link to review incremental changes.
What changes are proposed in this pull request?
Integrates the stats transform module into checkpoint_data() to automatically populate stats and stats_parsed fields based on table configuration when writing checkpoints.
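As a rough illustration of "based on table configuration", the sketch below assumes the configuration boils down to two switches, one for the JSON `stats` field and one for the typed `stats_parsed` field (the Delta protocol names such table properties `delta.checkpoint.writeStatsAsJson` and `delta.checkpoint.writeStatsAsStruct`; whether this PR reads exactly those is an assumption). `StatsConfig`, `FileStats`, and `checkpoint_stats` are hypothetical names, and real statistics carry more than a record count.

```rust
/// Hypothetical view of the relevant table configuration.
#[derive(Debug, Clone, Copy)]
struct StatsConfig {
    write_stats_as_json: bool,
    write_stats_as_struct: bool,
}

/// Hypothetical, heavily simplified per-file statistics.
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    num_records: u64,
}

/// Returns the (`stats`, `stats_parsed`) pair to write for one add action.
/// Each field is populated only when its switch is on and stats exist.
fn checkpoint_stats(
    source: Option<&FileStats>,
    cfg: StatsConfig,
) -> (Option<String>, Option<FileStats>) {
    let stats = source
        .filter(|_| cfg.write_stats_as_json)
        .map(|s| format!("{{\"numRecords\":{}}}", s.num_records));
    let stats_parsed = source
        .filter(|_| cfg.write_stats_as_struct)
        .cloned();
    (stats, stats_parsed)
}
```

The point of the sketch is only that the writer, not the caller, decides which of the two representations ends up in the checkpoint row.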
This PR affects the following public APIs
New APIs (non-breaking)
Breaking Changes
CheckpointWriter::checkpoint_data() return type changed
CheckpointWriter::finalize() parameter type changed
Migration
Callers should update to use the new CheckpointDataIterator trait. The trait provides all methods previously available on ActionReconciliationIterator (is_exhausted(), actions_count(), add_actions_count()), plus the new output_schema() method for obtaining the write schema.
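A caller written against the trait might look roughly like the sketch below. Only the trait name and the four method names come from this PR; the signatures, the `Schema` and `Batch` aliases, `ToyCheckpointData`, `toy_data`, and `write_checkpoint` are assumptions made for illustration, and the real kernel types are richer.

```rust
// Hypothetical stand-ins for the kernel's schema and batch types.
type Schema = Vec<&'static str>;
type Batch = Vec<String>;

/// Mirrors the method names described above; signatures are assumed.
trait CheckpointDataIterator: Iterator<Item = Batch> {
    fn output_schema(&self) -> &Schema;
    fn is_exhausted(&self) -> bool;
    fn actions_count(&self) -> i64;
    fn add_actions_count(&self) -> i64;
}

/// Toy implementer that counts actions as batches are consumed.
struct ToyCheckpointData {
    schema: Schema,
    batches: std::vec::IntoIter<Batch>,
    actions: i64,
    adds: i64,
    exhausted: bool,
}

impl Iterator for ToyCheckpointData {
    type Item = Batch;
    fn next(&mut self) -> Option<Batch> {
        match self.batches.next() {
            Some(batch) => {
                self.actions += batch.len() as i64;
                self.adds += batch.iter().filter(|a| a.starts_with("add")).count() as i64;
                Some(batch)
            }
            None => {
                self.exhausted = true;
                None
            }
        }
    }
}

impl CheckpointDataIterator for ToyCheckpointData {
    fn output_schema(&self) -> &Schema { &self.schema }
    fn is_exhausted(&self) -> bool { self.exhausted }
    fn actions_count(&self) -> i64 { self.actions }
    fn add_actions_count(&self) -> i64 { self.adds }
}

/// Builds a small two-batch input for the example.
fn toy_data() -> ToyCheckpointData {
    let batches = vec![
        vec!["add:a.parquet".to_string(), "remove:b.parquet".to_string()],
        vec!["add:c.parquet".to_string()],
    ];
    ToyCheckpointData {
        schema: vec!["add", "remove", "metaData"],
        batches: batches.into_iter(),
        actions: 0,
        adds: 0,
        exhausted: false,
    }
}

/// Caller depends on the trait: take the write schema first, drain the
/// batches, then read the counters once the iterator is exhausted.
fn write_checkpoint<I: CheckpointDataIterator>(mut data: I) -> (Schema, usize, i64, i64) {
    let schema = data.output_schema().clone();
    let mut rows_written = 0;
    for batch in data.by_ref() {
        rows_written += batch.len();
    }
    assert!(data.is_exhausted());
    (schema, rows_written, data.actions_count(), data.add_actions_count())
}
```

Depending on the trait rather than a concrete iterator type keeps the caller unchanged if the concrete type returned by `checkpoint_data()` changes again.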
How was this change tested?
Integration tests